In your final repo, there should be an R Markdown file that organizes all computational steps for evaluating your proposed Facial Expression Recognition framework.
This file is currently a template for running evaluation experiments. You should update it to match your own code while keeping exactly the same structure.
### Step 0: Set work directories
```r
set.seed(2020)
setwd("../doc")
# replace this with your own path, or manually set it in RStudio to where this Rmd file is located
# use a relative path for reproducibility
```

Provide directories for the training data. Training images and training fiducial points are stored in different subfolders.

```r
train_dir <- "../data/train_set/" # this will be modified for different data sets
train_image_dir <- paste(train_dir, "images/", sep="")
train_pt_dir <- paste(train_dir, "points/", sep="")
train_label_path <- paste(train_dir, "label.csv", sep="")
```
In this chunk, we have a set of controls for the evaluation experiments.
```r
run.cv <- TRUE               # run cross-validation on the training set
sample.reweight <- FALSE     # run sample reweighting in model training
K <- 5                       # number of CV folds
run.feature.train <- TRUE    # process features for the training set
run.test <- TRUE             # run evaluation on an independent test set
run.feature.test <- TRUE     # process features for the test set
run.poly.feature <- TRUE     # process polynomial features
run.add.poly.feature <- TRUE # add polynomial features to the distance matrix

# gbm
run.gbm <- TRUE
gbm.numtrees <- 1000         # number of trees to use in gbm

# svm
run.svm <- TRUE              # svm is the chosen advanced model
needs.balanced <- TRUE       # balance the data for model fitting
model.selection <- TRUE      # perform model selection on svm models

# random forest
run.balanced.data <- TRUE    # whether or not to balance the data
train.random.forest <- FALSE # train the random forest model
tune.random.forest <- FALSE  # tune the random forest model
```
Using cross-validation or independent test set evaluation, we compare the performance of models with different specifications. In this Starter Code, we tune the parameter lambda (the amount of shrinkage) for logistic regression with a LASSO penalty.
```r
lmbd <- c(1e-3, 5e-3, 1e-2, 5e-2, 1e-1)
model_labels <- paste("LASSO Penalty with lambda =", lmbd)

# train-test split
info <- read.csv(train_label_path)
n <- nrow(info)              # number of observations
n_train <- round(n*(4/5), 0) # use 4/5 of the data for training
train_idx <- sample(info$Index, n_train, replace = FALSE) # indices used for training
test_idx <- setdiff(info$Index, train_idx)                # indices not used for training
```
If you choose to extract features from images, for example with a Gabor filter, R will exhaust its memory if all images are read at once. The solution is to repeatedly read a smaller batch (e.g., 100 images) and process it before reading the next.
```r
n_files <- length(list.files(train_image_dir, "*jpg"))
# image_list <- list()
# for(i in 1:100){
#   image_list[[i]] <- readImage(paste0(train_image_dir, sprintf("%04d", i), ".jpg"))
# }
```
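The batching described above could be sketched as follows. This is a hypothetical loop, not part of the Starter Code: `process_batch()` is a placeholder for your own feature extraction, and `readImage` is assumed to come from the `EBImage` package.

```r
batch_size <- 100
for(start in seq(1, n_files, by = batch_size)){
  idx <- start:min(start + batch_size - 1, n_files)
  image_batch <- lapply(idx, function(i)
    readImage(paste0(train_image_dir, sprintf("%04d", i), ".jpg")))
  # process_batch() is hypothetical: extract features, then discard the images
  # features[idx, ] <- process_batch(image_batch)
  rm(image_batch); gc() # free memory before the next batch
}
```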
Fiducial points are stored in Matlab format. In this step, we read them and store them in a list.

```r
# function to read fiducial points
# input: index
# output: matrix of fiducial points corresponding to the index
readMat.matrix <- function(index){
  return(round(readMat(paste0(train_pt_dir, sprintf("%04d", index), ".mat"))[[1]], 0))
}

# load fiducial points
fiducial_pt_list <- lapply(1:n_files, readMat.matrix)
save(fiducial_pt_list, file="../output/fiducial_pt_list.RData")
```
The following plots show how pairwise distances between fiducial points can serve as features for facial emotion recognition.

(Figure 1: pairwise distances between fiducial points)
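As an illustration of this feature construction, the pairwise distances for a single image can be computed directly from its fiducial-point matrix with `dist()`. This is only a sketch, assuming `fiducial_pt_list` has been loaded; the exact feature layout produced by `feature.R` may differ.

```r
# matrix of (x, y) fiducial coordinates for one image -> flat distance vector
pt <- fiducial_pt_list[[1]]
pairwise_dist <- as.vector(dist(pt)) # all n*(n-1)/2 point-to-point distances
length(pairwise_dist)
```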
`feature.R` should be the wrapper for all your feature engineering functions and options. The function `feature()` should have options that correspond to the different scenarios for your project and produce an R object containing the features and responses required by all the models you evaluate later.
```r
res_cv <- as.data.frame(res_cv)
colnames(res_cv) <- c("mean_error", "sd_error", "mean_AUC", "sd_AUC")
res_cv$k <- as.factor(lmbd)

if(run.cv){
  p1 <- res_cv %>%
    ggplot(aes(x = as.factor(lmbd), y = mean_error,
               ymin = mean_error - sd_error, ymax = mean_error + sd_error)) +
    geom_crossbar() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
  p2 <- res_cv %>%
    ggplot(aes(x = as.factor(lmbd), y = mean_AUC,
               ymin = mean_AUC - sd_AUC, ymax = mean_AUC + sd_AUC)) +
    geom_crossbar() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1))
  print(p1)
  print(p2)
}
```
### Step 4: Train a classification model with training features and responses
Call the training and testing functions from the library.
`train.R` and `test.R` should be wrappers for all your model training steps and your classification/prediction steps.
+ `train.R`
  + Input: a data frame containing features and labels, and a parameter list.
  + Output: a trained model.
+ `test.R`
  + Input: the fitted classification model (trained on the training data) and the processed features from the testing images.
  + Output: predictions on the test set.
+ In this Starter Code, we use logistic regression with a LASSO penalty for classification.
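For concreteness, such wrappers might look like the following sketch built on `glmnet`. The signatures are hypothetical; your own `train.R` and `test.R` will differ in detail.

```r
library(glmnet)

# hypothetical train(): features + binary labels -> fitted LASSO logistic model
train <- function(features, labels, w = NULL, l = 0.01){
  glmnet(features, labels, family = "binomial",
         alpha = 1,    # alpha = 1 is the LASSO penalty
         lambda = l, weights = w)
}

# hypothetical test(): fitted model + test features -> predicted probabilities
test <- function(model, features){
  predict(model, newx = features, type = "response")
}
```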
```r
source("../lib/train.R")
source("../lib/test.R")
source("../lib/cross_validation.R")

feature_train <- as.matrix(dat_train[, 1:(ncol(dat_train)-1)]) # all columns except the label
label_train <- as.integer(dat_train$label)

if(run.cv){
  res_cv <- matrix(0, nrow = length(lmbd), ncol = 4)
  for(i in 1:length(lmbd)){
    cat("lambda = ", lmbd[i], "\n")
    res_cv[i,] <- cv.function(features = feature_train, labels = label_train, K,
                              l = lmbd[i], reweight = sample.reweight)
    save(res_cv, file="../output/res_cv.RData")
  }
}else{
  load("../output/res_cv.RData")
}
```
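For reference, the same lambda search could be done directly with `glmnet::cv.glmnet` instead of the project's `cv.function()`. This is a sketch under the assumption of binary labels and an installed `glmnet`, not the Starter Code's own implementation.

```r
library(glmnet)
# hypothetical stand-in for cv.function(): K-fold CV over the same lambda grid
cvfit <- cv.glmnet(feature_train, label_train, family = "binomial",
                   alpha = 1,                          # LASSO penalty
                   lambda = c(1e-3, 5e-3, 1e-2, 5e-2, 1e-1),
                   nfolds = 5, type.measure = "class") # misclassification error
cvfit$lambda.min # lambda with the lowest CV error
```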
Visualize cross-validation results.
```r
cat("Time for constructing training features =", tm_feature_train[1], "s \n")
cat("Time for constructing testing features =", tm_feature_test[1], "s \n")
cat("Time for training model =", tm_train[1], "s \n")
cat("Time for testing model =", tm_test[1], "s \n")
```
* Choose the "best" parameter value
```r
# par_best <- lmbd[which.min(res_cv$mean_error)] # or lmbd[which.max(res_cv$mean_AUC)]
```
Compute observation weights so that each class contributes equally (used when `sample.reweight` is on).

```r
# test weights
label_test <- as.integer(dat_test$label)
weight_test <- rep(NA, length(label_test))
for (i in unique(label_test)){
  weight_test[label_test == i] <- 0.5 * length(label_test) / length(label_test[label_test == i])
}

# training weights
weight_train <- rep(NA, length(label_train))
for (v in unique(label_train)){
  weight_train[label_train == v] <- 0.5 * length(label_train) / length(label_train[label_train == v])
}

if (run.gbm){
  if (sample.reweight){
    tm_train <- system.time(fit_train <- train_gbm(dat_train, s=0.1, K=K, n=gbm.numtrees, w = weight_train))
  } else {
    tm_train <- system.time(fit_train <- train_gbm(dat_train, s=0.1, K=K, n=gbm.numtrees, w = NULL))
  }
  # plot the performance
  best.iter.oob <- gbm.perf(fit_train, method="OOB") # out-of-bag estimate of the best number of trees
  print(best.iter.oob)
  best.iter.cv <- gbm.perf(fit_train, method="cv")   # K-fold CV estimate of the best number of trees
  print(best.iter.cv)
} else {
  if (sample.reweight){
    tm_train <- system.time(fit_train <- train(feature_train, label_train, w = weight_train, par_best))
  } else {
    tm_train <- system.time(fit_train <- train(feature_train, label_train, w = NULL, par_best))
  }
}
save(fit_train, file="../output/fit_train.RData")
```
```r
library(e1071)
tm_svm_default_mod <- NA
tm_svm_linear_cost <- NA
tm_svm_linear_mod <- NA

if(model.selection){
  svm_model_auc <- rep(NA, 2)

  # default model
  if(run.cv){
    tm_svm_default_mod <- system.time(svm_default_mod <- svm_default_train(svm_training_data, K))
    save(svm_default_mod, file="../output/svm_default_mod.RData")
  } else {
    load(file="../output/svm_default_mod.RData")
  }
  svm_default_pred <- svm_test(svm_default_mod, svm_training_data)
  # mean(svm_default_pred == svm_training_data$label)
  tpr.fpr_default <- WeightedROC(as.numeric(svm_default_pred), svm_training_data$label)
  svm_model_auc[1] <- WeightedAUC(tpr.fpr_default)

  # linear kernel
  if(run.cv){
    tm_svm_linear_cost <- system.time(best.cost <- svm.cv.linear(svm_training_data, K))
    tm_svm_linear_mod <- system.time(svm_linear_mod <- svm_linear_train(svm_training_data, best.cost, K))
    save(svm_linear_mod, file="../output/svm_linear_mod.RData")
  } else {
    load(file="../output/svm_linear_mod.RData")
  }
  svm_linear_pred <- svm_test(svm_linear_mod, svm_training_data)
  # mean(svm_linear_pred == svm_training_data$label)
  tpr.fpr_linear <- WeightedROC(as.numeric(svm_linear_pred), svm_training_data$label)
  svm_model_auc[2] <- WeightedAUC(tpr.fpr_linear)

  # select the model with the highest AUC
  if(which.max(svm_model_auc) == 1){
    svm_best_mod <- svm_default_mod
  } else {
    svm_best_mod <- svm_linear_mod
  }
  save(svm_best_mod, file="../output/svm_best_mod.RData")
} else {
  load(file="../output/svm_best_mod.RData")
}
```
Random Forest:
## Tune RF
```r
svm_auc
```

[1] 0.6217088
mtry = 154 is the best.
## Find the best ntrees
```r
source("../lib/random_forest.R")

# evaluate a fitted random forest on the test set and report AUC, accuracy, and timing
evaluate_rf <- function(fit, time.rf.train){
  time.rf.test <- system.time(
    random_forest_test_prep <- random_forest_test(model = fit, testset = dat_test)
  )
  random_forest_test_prep <- as.numeric(as.character(random_forest_test_prep))
  accu_rf_test <- mean(random_forest_test_prep == dat_test$label)
  random_forest_label <- round(random_forest_test_prep)
  accu_rf <- sum(weight_test * (random_forest_label == label_test)) / sum(weight_test)
  tpr.fpr <- WeightedROC(random_forest_test_prep, label_test, weight_test)
  auc_rf <- WeightedAUC(tpr.fpr)
  cat("The AUC of model after reweighting: RF is", auc_rf, ".\n")
  cat("The accuracy of model: Random Forest on imbalanced testing data is", accu_rf_test*100, "%.\n")
  cat("The accuracy of model: Random Forest on balanced testing data is", accu_rf*100, "%.\n")
  cat("Time for training model Random Forest =", time.rf.train[1], "s \n")
  cat("Time for testing model Random Forest =", time.rf.test[1], "s \n")
}

# train and evaluate with ntrees in {500, 1000, 1500, 2000, 2500};
# each ntrees value has its own wrapper in random_forest.R
# (an earlier run with 500 trees gave: AUC 0.5031999, accuracy 80.33333 % imbalanced /
#  50.31999 % balanced, train 20.95 s, test 0.09 s)
if(tune.random.forest){
  for(nt in c(500, 1000, 1500, 2000, 2500)){
    train_fun <- get(paste0("random_forest_train_", nt))
    fit_name  <- paste0("random_forest_fit_", nt)
    time.rf.train <- system.time(assign(fit_name, train_fun(dat_train_balanced_rose, mtry = 154)))
    save(list = fit_name, file = paste0("../output/rf_train_", nt, "_trees.RData"))
    evaluate_rf(get(fit_name), time.rf.train)
  }
}
```
Testing results:

| ntrees | AUC (reweighted) | accuracy, imbalanced test data | accuracy, balanced test data | training time (s) | testing time (s) |
|-------:|-----------------:|-------------------------------:|-----------------------------:|------------------:|-----------------:|
| 500    | 0.5116745        | 80.66667 %                     | 51.16745 %                   | 713.63            | 0.19             |
| 1000   | 0.5201491        | 81 %                           | 52.01491 %                   | 1367.94           | 0.28             |
| 1500   | 0.5201491        | 81 %                           | 52.01491 %                   | 2077.56           | 0.36             |
| 2000   | 0.5201491        | 81 %                           | 52.01491 %                   | 3142.77           | 0.56             |
| 2500   | 0.5159118        | 80.83333 %                     | 51.59118 %                   | 3963.67           | 0.62             |
Therefore, we should use ntrees = 1000: it reaches the best AUC at the lowest training cost.
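As a side note, if the wrappers in `random_forest.R` are built on the `randomForest` package, the five separate fits above could in principle be replaced by one large fit, since `fit$err.rate` records the cumulative out-of-bag error after each tree. A minimal sketch, assuming the `randomForest` package and a factor column `label` in `dat_train_balanced_rose`:

```r
library(randomForest)
# one fit with the largest ntrees considered; the OOB error after each tree
# is stored in fit$err.rate, so smaller ntrees values need not be refit
fit <- randomForest(label ~ ., data = dat_train_balanced_rose,
                    ntree = 2500, mtry = 154)
plot(fit$err.rate[, "OOB"], type = "l",
     xlab = "number of trees", ylab = "OOB error")
```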
```r
source("../lib/random_forest.R")

if(train.random.forest){
  time.rf.train <- system.time(random_forest_fit <- random_forest_train(dat_train_balanced_rose, mtry = 154))
  save(random_forest_fit, file = "../output/random_forest_train.RData")
  save(time.rf.train, file = "../output/random_forest_train_time.RData")
}else{
  load(file = "../output/random_forest_train_time.RData")
  load(file = "../output/random_forest_train.RData")
}

random_forest_test_prep <- NA
if(run.test){
  load(file="../output/random_forest_train.RData")
  time.rf.test <- system.time(
    random_forest_test_prep <- random_forest_test(model = random_forest_fit, testset = dat_test)
  )
}
```
```r
# reweight the test data to represent a balanced label distribution
if (run.gbm){
  accu <- mean(dat_test$label == label_pred)
  cat("The accuracy of the GBM baseline model is", accu*100, "%.\n")
} else {
  label_test <- as.integer(dat_test$label)
  weight_test <- rep(NA, length(label_test))
  for (v in unique(label_test)){
    weight_test[label_test == v] <- 0.5 * length(label_test) / length(label_test[label_test == v])
  }
  accu <- sum(weight_test * (label_pred == label_test)) / sum(weight_test)
  tpr.fpr <- WeightedROC(prob_pred, label_test, weight_test)
  auc <- WeightedAUC(tpr.fpr)
  cat("The accuracy of model:", model_labels[which.min(res_cv$mean_error)], "is", accu*100, "%.\n")
  cat("The AUC of model:", model_labels[which.min(res_cv$mean_error)], "is", auc, ".\n")
}

# random forest evaluation on the test set
random_forest_test_prep <- as.numeric(as.character(random_forest_test_prep))
accu_rf_test <- mean(random_forest_test_prep == dat_test$label)
random_forest_label <- round(random_forest_test_prep)
tpr.fpr <- WeightedROC(random_forest_test_prep, label_test, weight_test)
auc_rf <- WeightedAUC(tpr.fpr)
cat("The AUC of model after reweighting: RF is", auc_rf, ".\n")
cat("The accuracy of model: Random Forest on testing data is", accu_rf_test*100, "%.\n")
cat("Time for training model Random Forest =", time.rf.train[1], "s \n")
cat("Time for testing model Random Forest =", time.rf.test[1], "s \n")
```
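The reweighted accuracy used above is simply balanced accuracy: each class's correct predictions are scaled so that both classes contribute equal total weight. A small self-contained check on synthetic labels (not project data):

```r
# synthetic, imbalanced labels: 8 of class 1, 2 of class 2
label_test <- c(rep(1, 8), rep(2, 2))
label_pred <- c(rep(1, 8), 2, 1) # all of class 1 right, 1 of 2 of class 2 right

# same weighting scheme as in the chunk above
weight_test <- rep(NA, length(label_test))
for (v in unique(label_test)){
  weight_test[label_test == v] <- 0.5 * length(label_test) / sum(label_test == v)
}
accu <- sum(weight_test * (label_pred == label_test)) / sum(weight_test)
accu              # 0.75
mean(c(8/8, 1/2)) # 0.75 as well: the average of the per-class accuracies
```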
Prediction performance matters, but so do the running times for constructing features and for training the model, especially when computational resources are limited.

```r
# cat("Time for constructing training features =", tm_feature_train[1], "s \n")
# cat("Time for constructing testing features =", tm_feature_test[1], "s \n")
# cat("Time for training model =", tm_train[1], "s \n")
# cat("Time for testing model =", tm_test[1], "s \n")
```
```r
best.cost$best.parameters$cost
```

[1] 0.01
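The `best.cost$best.parameters$cost` access pattern above matches the object returned by `e1071::tune`, so the cost search inside `svm.cv.linear` could, for reference, be sketched as follows. This is a hypothetical stand-alone version, not the project wrapper; `svm_training_data` is assumed to have a factor column `label` and numeric features.

```r
library(e1071)
# grid search over the linear-kernel cost parameter with 5-fold CV
tuned <- tune(svm, label ~ ., data = svm_training_data,
              kernel = "linear",
              ranges = list(cost = c(0.001, 0.01, 0.1, 1, 10)),
              tunecontrol = tune.control(cross = 5))
tuned$best.parameters$cost # cost with the lowest CV error
```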
```r
tm_svm_rebalanced_test <- NA
if(needs.balanced){
  tm_svm_rebalanced_test <- system.time(svm_testing_data <- ROSE(label ~ ., data = dat_test)$data)
  save(svm_testing_data, file="../output/svm_testing_data.RData")
} else {
  load(file="../output/svm_testing_data.RData")
}

tm_svm_test <- system.time(svm_pred <- svm_test(svm_linear_mod, svm_testing_data))
svm_accu <- mean(svm_pred == svm_testing_data$label)
tpr.fpr <- WeightedROC(as.numeric(svm_pred), svm_testing_data$label)
svm_auc <- WeightedAUC(tpr.fpr)

cat("The accuracy of svm model is", svm_accu*100, "%.\n")
cat("The AUC of svm model is", svm_auc, ".\n")
cat("Time for rebalancing training data =", tm_svm_rebalanced_train[1], "s \n")
cat("Time for rebalancing testing data =", tm_svm_rebalanced_test[1], "s \n")
cat("Time for training svm model =", tm_svm_linear_mod[1], "s \n")
cat("Time for testing svm model =", tm_svm_test[1], "s \n")
```
### Reference

Du, S., Tao, Y., & Martinez, A. M. (2014). Compound facial expressions of emotion. *Proceedings of the National Academy of Sciences*, 111(15), E1454-E1462.